Logistic Regression
Introduction

The most common way of modeling associations with a binary outcome (Yes vs. No, Diseased vs. Healthy, etc.) is logistic regression. In publications, results from these models are commonly reported as odds ratios (OR). Odds ratios can be understood using the following example, in which individuals are classified by disease status (Diseased or Healthy) and by exposure (Exposed or Not Exposed). Writing D_E and D_N for the number of diseased individuals who were exposed and not exposed, and H_E and H_N for the corresponding numbers of healthy individuals, the odds ratio is defined as:

: \frac{D_E/D_N}{H_E/H_N}

This is the ratio of Exposed to Not Exposed among those with the disease, divided by the ratio of Exposed to Not Exposed among healthy individuals. If the odds ratio is greater than 1, the exposure is associated with presence of the disease. Conversely, if the odds ratio is less than 1, the exposure is associated with not having the disease.

Example

The following example can be found in more detail here

A researcher is interested in how variables such as GRE (Graduate Record Exam) scores, GPA (grade point average), and prestige of the undergraduate institution affect admission into graduate school. The response variable, admit/don't admit, is binary.

First, let's look at a plot of GPA by whether a student was admitted to graduate school:

The red tick marks at the bottom correspond to the GPAs of those who were not admitted, and the green tick marks at the top correspond to the GPAs of those who were admitted. As we would expect, those with a higher GPA appear more likely to be admitted.

Let's fit a model to predict the odds of being admitted, using GPA as a predictor. The model information is typically displayed in the following format:

The numbers in the Estimate column correspond to '''log odds ratios'''. The '''Intercept''' estimate is the log odds of being admitted for someone with a GPA of 0. The '''gpa''' estimate is the change in the log odds of being admitted for a 1-unit increase in GPA. However, most people do not think in terms of log odds.
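To make the log odds scale more concrete, here is a minimal Python sketch of the fitted model's linear predictor. It assumes the estimates quoted in this example (an Intercept of -4.3576 and a gpa estimate of 1.0511); the function name log_odds_admitted is purely illustrative and not part of any library.

```python
# Estimates quoted in this example, both on the log-odds scale.
INTERCEPT = -4.3576  # log odds of admission for a GPA of 0
GPA_SLOPE = 1.0511   # change in log odds per 1-unit increase in GPA


def log_odds_admitted(gpa: float) -> float:
    """Linear predictor of the model: intercept + slope * GPA (illustrative helper)."""
    return INTERCEPT + GPA_SLOPE * gpa


if __name__ == "__main__":
    for gpa in (0.0, 2.0, 3.0, 4.0):
        print(f"GPA {gpa:.1f}: log odds = {log_odds_admitted(gpa):.4f}")

    # Two students whose GPAs differ by one unit differ in log odds
    # by exactly the gpa estimate, wherever they sit on the GPA scale.
    print(f"difference: {log_odds_admitted(4.0) - log_odds_admitted(3.0):.4f}")
```

At a GPA of 0 the function returns the intercept itself, and the difference between any two GPAs one unit apart equals the gpa estimate, which is the interpretation given above.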
When reporting logistic regression results, it is often preferable to report odds ratios. To get the odds ratio, raise e to the power of the log odds ratio. In this example, the GPA odds ratio is e^{1.0511} , which equals 2.8608. This means that if we compare 2 people, one of whom has a GPA that is 1 point higher, that person would be expected to have 2.8608 times the odds of being admitted compared to the other person.

Another question we could answer from this model is: What are the expected odds of being admitted for someone with a GPA of 3.5? To solve this, we take the intercept term, add the GPA term multiplied by 3.5, and exponentiate the result. Mathematically: e^{-4.3576 + (1.0511 \times 3.5)} . The resulting odds are 0.5073.

It is important to note that odds are '''not''' the same thing as a probability. In this example:

Odds(Admitted) = \frac{Probability(Admitted)}{Probability(Rejected)}

Or equivalently:

Odds(Admitted) = \frac{Probability(Admitted)}{1 - Probability(Admitted)}

By rearranging the above terms, we can calculate the probability of being admitted:

Probability(Admitted) = \frac{Odds(Admitted)}{1 + Odds(Admitted)}

From our earlier example, a GPA of 3.5 corresponds to odds of 0.5073. Putting that result into the formula gives the probability of being admitted:

Probability(Admitted) = \frac{0.5073}{1 + 0.5073} = 0.3366

This means that someone with a GPA of 3.5 has a predicted probability of being admitted of 33.66%. The curve in the graph below shows the predicted probability of being admitted at various GPA values.
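The conversions described above (log odds to odds ratio, log odds to odds, and odds to probability) can be sketched in a few lines of Python. This is only an illustration that assumes the two estimates quoted in the text; the helper names odds_admitted and odds_to_probability are hypothetical.

```python
import math

# Estimates quoted in the text, both on the log-odds scale.
INTERCEPT = -4.3576
GPA_SLOPE = 1.0511


def odds_admitted(gpa: float) -> float:
    """Odds of admission: exponentiate intercept + slope * GPA (illustrative helper)."""
    return math.exp(INTERCEPT + GPA_SLOPE * gpa)


def odds_to_probability(odds: float) -> float:
    """Convert odds to a probability: odds / (1 + odds)."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    odds_ratio = math.exp(GPA_SLOPE)   # odds ratio for a 1-point increase in GPA
    odds = odds_admitted(3.5)          # odds of admission at a GPA of 3.5
    prob = odds_to_probability(odds)   # matching predicted probability

    print(f"odds ratio per GPA point: {odds_ratio:.4f}")
    print(f"odds at GPA 3.5:          {odds:.4f}")
    print(f"probability at GPA 3.5:   {prob:.4f}")
```

The printed probability can differ from 0.3366 in the last decimal place, because the text rounds the odds to 0.5073 before dividing while the code carries full precision through the calculation.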